Very Predictive Ngrams for Space-Limited Probabilistic Models

Authors

  • Paul R. Cohen
  • Charles A. Sutton
Abstract

In sequential prediction tasks, one repeatedly tries to predict the next element in a sequence. A classical way to solve these problems is to fit an order-n Markov model to the data, but fixed-order models are often bigger than they need to be. In a fixed-order model, all predictors are of length n, even if a shorter predictor would work just as well. We present a greedy algorithm, vpr, for finding variable-length predictive rules. Although vpr is not optimal, we show that on English text, it performs similarly to fixed-order models but uses fewer parameters.
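As a rough illustration of the fixed-order baseline the abstract refers to (not the vpr algorithm itself, whose details are not given here), the sketch below fits an order-n Markov model by counting which symbol follows each length-n context. The function names `fit_fixed_order`, `predict_next`, and `num_parameters` are hypothetical, and counting stored (context, successor) pairs is only an assumed proxy for model size.

```python
from collections import Counter, defaultdict


def fit_fixed_order(text, n):
    """Count next-symbol frequencies for every length-n context in text."""
    counts = defaultdict(Counter)
    for i in range(n, len(text)):
        counts[text[i - n:i]][text[i]] += 1
    return counts


def predict_next(counts, context, n):
    """Return the most frequent successor of the last n symbols, or None if unseen."""
    key = context[-n:] if n > 0 else ""
    if key not in counts:
        return None
    return counts[key].most_common(1)[0][0]


def num_parameters(counts):
    """Number of stored (context, successor) pairs: an assumed proxy for model size."""
    return sum(len(successors) for successors in counts.values())


# Every predictor has length exactly n, even where a shorter one would suffice.
model = fit_fixed_order("abcabcabd", 2)
print(predict_next(model, "ab", 2), num_parameters(model))
```

Every context in this model has exactly n symbols, so the table grows with the number of distinct length-n contexts in the data. That growth is the cost that variable-length rules aim to reduce.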


Similar resources

Accounting ngrams and multi-word terms can improve topic models

The paper presents an empirical study of integrating ngrams and multi-word terms into topic models, while maintaining similarities between them and words based on their component structure. First, we adapt the PLSA-SIM algorithm to the more widespread LDA model and ngrams. Then we propose a novel algorithm LDA-ITER that allows the incorporation of the most suitable ngrams into topic models. The...


A supervised probabilistic principal component analysis mixture model in a lossless dimensionality reduction framework for face recognition

In this paper, we first propose a supervised version of the probabilistic principal component analysis mixture model. Then, we consider a learning predictive model with projection penalties, as an approach for dimensionality reduction without loss of information for face recognition. In the proposed method, first a local linear underlying manifold of data samples is obtained using the supervised...


Performance Evaluation of Dynamic Modulus Predictive Models for Asphalt Mixtures

Dynamic modulus characterizes the viscoelastic behavior of asphalt materials and is the most important input parameter for design and rehabilitation of flexible pavements using Mechanistic–Empirical Pavement Design Guide (MEPDG). Laboratory determination of dynamic modulus is very expensive and time consuming. To overcome this challenge, several predictive models were developed to determine dyn...


Factored Models for Probabilistic Modal Logic

Modal logic represents knowledge that agents have about other agents’ knowledge. Probabilistic modal logic further captures probabilistic beliefs about probabilistic beliefs. Models in those logics are useful for understanding and decision making in conversations, bargaining situations, and competitions. Unfortunately, probabilistic modal structures are impractical for large real-world applicat...


Occurrence Based Statistics in Machine Translation

As MT approaches demand longer context for better translation quality, the limitations of current language modeling techniques become explicit. The computational inability to model the likelihood of longer ngrams, and the likelihood of their usage in a probabilistic manner, has prevented us from exploring long ngrams in MT. In this paper, we propose and investigate a new set of features called oc...




Publication date: 2003